Sparse Matrix Multiplication Using UPC
Authors
Abstract
Partitioned global address space (PGAS) languages such as Unified Parallel C (UPC) promise high programmer productivity. Because they present a shared view of the address space, they make distributing data and operating on ghost zones relatively easy, and their thread-data affinity enables locality to be exploited. In this paper we consider sparse matrix multiplication, an important operation in many scientific and engineering applications for which several high-performance algorithms and libraries have recently been developed. Our work takes advantage of one of UPC's distinguishing features: its globally addressable memory model, which lets a thread read or write data in any other thread's memory directly, without the explicit inter-process communication required by MPI. Our goal is to evaluate both parallel programming models through an experimental study of sparse matrix multiplication, comparing them in terms of conceptual complexity and execution time. We show that UPC, with its distributed shared-memory model, offers a substantial productivity advantage over message passing for sparse matrix multiplication.
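To make the shared-address-space style concrete, below is a minimal illustrative sketch (not the paper's code) of a sparse matrix-vector product y = A*x in UPC, with A held in CSR form in the global address space. The sizes N and NNZ, the array names, the default cyclic distribution, and the small test matrix are all assumptions chosen for brevity; the point is only that upc_forall runs each row on the thread that owns y[i], while reads of col_idx, val, and x may be remote and need no explicit message passing.

    /* Illustrative sketch only: y = A*x with A in CSR form, stored in the
     * UPC global address space.  Fixed-size shared arrays assume compilation
     * with a static thread count.  All names and sizes here are assumptions. */
    #include <upc.h>
    #include <stdio.h>

    #define N   8                    /* rows/columns (assumed)            */
    #define NNZ (2 * N)              /* two nonzeros per row (assumed)    */

    /* Default (cyclic) distribution: element i has affinity to thread
     * i % THREADS.  Any thread may read remote elements directly. */
    shared int    row_ptr[N + 1];
    shared int    col_idx[NNZ];
    shared double val[NNZ];
    shared double x[N];
    shared double y[N];

    int main(void) {
        int i, k;

        /* Thread 0 builds a small test matrix: 1.0 on the diagonal and
         * 0.5 on the wrapped superdiagonal, plus x = all ones. */
        if (MYTHREAD == 0) {
            for (i = 0; i < N; i++) {
                row_ptr[i]         = 2 * i;
                col_idx[2 * i]     = i;            val[2 * i]     = 1.0;
                col_idx[2 * i + 1] = (i + 1) % N;  val[2 * i + 1] = 0.5;
                x[i] = 1.0;
            }
            row_ptr[N] = NNZ;
        }
        upc_barrier;

        /* Each iteration executes on the thread owning y[i]; reads of
         * col_idx, val and x may be remote, resolved transparently by the
         * globally addressable memory model. */
        upc_forall (i = 0; i < N; i++; &y[i]) {
            double sum = 0.0;
            for (k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                sum += val[k] * x[col_idx[k]];
            y[i] = sum;
        }
        upc_barrier;

        if (MYTHREAD == 0)
            for (i = 0; i < N; i++)
                printf("y[%d] = %.1f\n", i, (double)y[i]);
        return 0;
    }

With a UPC compiler such as Berkeley UPC, a sketch like this would typically be built for a static thread count and launched with upcrun (exact flags vary by installation); the contrast with an MPI version is that no send or receive calls appear anywhere in the source.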
Similar Resources
Data-Parallel Language for Correct and Efficient Sparse Matrix Codes
Data-Parallel Language for Correct and Efficient Sparse Matrix Codes, by Gilad Arnold. Doctor of Philosophy in Computer Science, University of California, Berkeley; Professor Rastislav Bodík, Chair. Sparse matrix formats encode very large numerical matrices with relatively few nonzeros. They are typically implemented using imperative languages, with emphasis on low-level optimization. Such implement...
Sparse Matrix Multiplication on CAM Based Accelerator
Sparse matrix multiplication is an important component of linear algebra computations. In this paper, an architecture based on Content Addressable Memory (CAM) and Resistive Content Addressable Memory (ReCAM) is proposed for accelerating sparse matrix-by-sparse-vector and sparse matrix-by-matrix multiplication in the CSR format. Using functional simulation, we show that the proposed ReCAM-based accelerator exhibits...
Efficient Sparse Matrix-Matrix Multiplication on Multicore Architectures
We describe a new parallel sparse matrix-matrix multiplication algorithm in shared memory using a quadtree decomposition. Our preliminary implementation is nearly as fast as the best sequential method on one core, and scales well to multiple cores.
An Efficient Phone N-Gram Forward-Backward Computation Using Dense Matrix Multiplication
The forward-backward algorithm is commonly used to train neural network acoustic models when optimizing sequence objectives such as MMI and sMBR. Recent work on lattice-free MMI training of neural network acoustic models shows that the forward-backward algorithm can be computed efficiently in the probability domain as a series of sparse matrix multiplications using GPUs. In this paper, we present...
Journal:
Volume / Issue:
Pages: -
Publication year: 2007